# Simulation of PCI Express<sup>™</sup> Transaction Layer using Hardware Description Language

V. Sudheer Raja\*, Dr. M. V. Raghavendra\*\*, G. Subbarao\*

\* Assistant professor, Adama Science and Technology University, Adama, Ethiopia

\*\* Associate professor, Adama Science and Technology University, Adama, Ethiopia

| Article Info | ABSTRACT |
|--------------|----------|
| Article history:<br>Received Nov 12, 2014<br>Revised Feb 20, 2015<br>Accepted Mar 26, 2015 | PCI Express is a high-speed serial connection that operates more like a network than a bus. PCI Express will serve as a general-purpose I/O interconnect for a wide variety of future computing and communications platforms. PCI Express (PCIe) is implemented with a split-transaction protocol that provides more bandwidth and is compatible with existing operating systems. PCI Express has three discrete logical layers: the Transaction Layer, the Data Link Layer, and the Physical Layer. This paper analyzes and simulates the function of a Transaction Layer IP core at the system level with a top-down design method, implements the Transaction Layer in the Very High Speed Integrated Circuit Hardware Description Language (VHDL), and provides simulation results obtained with the Active HDL simulation tool. The simulation results show that the designed IP core meets the protocol specifications required for the proper functioning of the PCI Express Transaction Layer. |
| Keyword:<br>Active HDL<br>IP core<br>PCI Express<br>Transaction layer<br>VHDL | |

> Copyright © 2015 Institute of Advanced Engineering and Science. All rights reserved.

#### **Corresponding Author:**

V. Sudheer Raja, Assistant professor, Adama Science and Technology University, Adama, Ethiopia. Email: sudheerrajav@yahoo.com

#### 1. INTRODUCTION

PCI Express is a high-speed serial connection that operates more like a network than a bus. PCI Express will serve as a general-purpose I/O interconnect for a wide variety of future computing and communications platforms. Instead of one bus that handles data from multiple sources, PCIe has a switch that controls several point-to-point serial connections. Peripheral Component Interconnect (PCI) slots are such an integral part of a computer's architecture that most people take them for granted. For years, PCI has been a versatile, functional way to connect sound, video and network cards to a motherboard. Key PCI attributes, such as its usage model and software interfaces, are maintained, whereas its bandwidth-limiting parallel bus implementation is replaced by a long-life, fully serial interface. A split-transaction protocol is implemented with attributed packets that are prioritized and optimally delivered to their target.

The PCI Express Architecture is specified in three logical layers, as shown in Figure 1. The PCI model of a load-store architecture with a flat address space is maintained to provide compatibility with all existing applications and drivers. The Transaction Layer, Data Link Layer and Physical Layer form the basic architecture. Each of these layers is subdivided into two sections: one that processes the information being transmitted and one that processes the information being received. PCI Express components communicate with each other through packets, which carry the information. These packets are generated in the Transaction and Data Link Layers. The Physical Layer transports packets between the link layers of two PCI Express agents.

The primary role of a link layer is to ensure reliable delivery of the packet across the PCI Express link. The transaction layer receives read and write requests from the software layer and creates request packets for transmission to the link layer. As the transmitted packets flow through the other layers, they are extended with additional information necessary to handle packets at those layers.



Figure 1. PCI Express<sup>™</sup> Architecture showing logical layers

#### 2. TRANSACTION LAYER

The transaction layer is the top layer of the PCI Express architecture. It is responsible for the assembly and disassembly of the data packets used for communication. The transaction layer receives read and write requests from the software layer and creates request packets for transmission to the link layer. These data packets are called TLPs (Transaction Layer Packets). The Transaction Layer is also responsible for managing credit-based flow control for TLPs. All requests are implemented as split transactions, and some of the request packets need a response packet. Each packet has a unique identifier that enables response packets to be directed to the correct originator. TLPs support either 32-bit memory addressing or extended 64-bit memory addressing.

The transaction layer supports four address spaces: the three PCI address spaces (memory, I/O and configuration) plus a Message Space that carries all prior side-band signals, such as interrupts, power-management requests and resets, as in-band Messages. The basic use of each address space is shown in Table 1.

| Address Space | Transaction Types         | Purpose                                                                                         |
|---------------|---------------------------|-------------------------------------------------------------------------------------------------|
| Memory        | Read, Write               | Transfer data to or from a location in the system memory map                                    |
| IO            | Read, Write               | Transfer data to or from a location in the system IO map                                        |
| Configuration | Read, Write               | Transfer data to or from a location in the configuration space of a PCI-compatible device       |
| Message       | Baseline, Vendor-specific | General in-band messaging and event reporting (without consuming memory or IO address resources) |

 Table 1. PCI Express Address Space and Transaction Types

#### 2.1. Transaction Layer Packet Format

The Transaction Layer is responsible for managing credit-based flow control for TLPs. All requests are implemented as split transactions. Accesses to the four address spaces in PCI Express are accomplished using split-transaction requests and completions.

Three kinds of packets, Posted, Non-Posted and Completion (Cpl), carry the transactions handled by the transaction layer. Each Transaction Layer Packet contains a three or four double-word (12 or 16 byte) header. Figure 2 shows the complete view of a TLP in the four double-word format.

| Framing (STP) | Sequence Number | Header | Data | Digest | LCRC | Framing (End) |
|---------------|-----------------|--------|------|--------|------|---------------|

| Header bytes | Fields (most significant bit first)                                                                        |
|--------------|------------------------------------------------------------------------------------------------------------|
| Byte 0       | R (bit 7), Fmt (bits 6:5), Type (bits 4:0)                                                                 |
| Byte 1       | R (bit 7), TC (bits 6:4), R (bits 3:0)                                                                     |
| Bytes 2-3    | TD (bit 7), EP (bit 6), Attr (bits 5:4), R (bits 3:2), Length (bits 1:0 of byte 2 and all of byte 3)       |
| Bytes 4-7    | Fields vary with TLP type; Last DW BE (bits 7:4) and 1st DW BE (bits 3:0) are shown in the last byte       |
| Bytes 8-11   | Fields vary with TLP type                                                                                  |

Figure 2. Generic 4DW Header TLP Format

Included in the 3DW or 4DW header are two fields, Type and Format (Fmt), which define the format of the remainder of the header and the routing method to be used on the entire TLP as it moves between devices in the PCI Express topology.
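The layout of the first header DWord can be made concrete with a short sketch. The Python below is only an illustration, not the authors' VHDL: it assumes byte 0 is the most significant byte of a 32-bit word, and the function names `pack_dw0`, `unpack_dw0`, `header_dwords` and `has_data` are hypothetical.

```python
def pack_dw0(fmt, tlp_type, tc, td, ep, attr, length):
    """Pack bytes 0-3 of a TLP header into one 32-bit integer.

    Assumption: byte 0 is the most significant byte of the word.
    """
    assert 0 <= fmt < 4 and 0 <= tlp_type < 32 and 0 <= tc < 8
    assert td in (0, 1) and ep in (0, 1)
    assert 0 <= attr < 4 and 0 <= length < 1024
    return ((fmt << 29) | (tlp_type << 24) | (tc << 20)
            | (td << 15) | (ep << 14) | (attr << 12) | length)


def unpack_dw0(dw0):
    """Split the first header DWord back into its named fields."""
    return {
        "fmt": (dw0 >> 29) & 0x3,     # byte 0, bits 6:5
        "type": (dw0 >> 24) & 0x1F,   # byte 0, bits 4:0
        "tc": (dw0 >> 20) & 0x7,      # byte 1, bits 6:4
        "td": (dw0 >> 15) & 0x1,      # byte 2, bit 7
        "ep": (dw0 >> 14) & 0x1,      # byte 2, bit 6
        "attr": (dw0 >> 12) & 0x3,    # byte 2, bits 5:4
        "length": dw0 & 0x3FF,        # 10-bit Length field
    }


def header_dwords(fmt):
    """Fmt bit 0 selects a 4DW header (1) or a 3DW header (0)."""
    return 4 if fmt & 0x1 else 3


def has_data(fmt):
    """Fmt bit 1 indicates that a data payload follows the header."""
    return bool(fmt & 0x2)


# A 3DW memory write carrying one DWord of data.
dw0 = pack_dw0(fmt=0b10, tlp_type=0b00000, tc=0, td=0, ep=0, attr=0, length=1)
fields = unpack_dw0(dw0)
```

Reading Fmt first is what lets a receiver know how many header DWords to expect before it interprets the rest of the packet.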

#### 2.2. Overview of TLP Header Information

As TLPs arrive at an ingress port, they are first checked for errors at both the physical and data link layers of the receiver. Assuming there are no errors, TLP routing is performed; basic steps include:

- a. The TLP header Type and Format fields in the first DWord are examined to determine the size and format of the remainder of the packet
- b. Depending on the routing method associated with the packet, the device will determine if it is the intended recipient; if so, it will accept (consume) the TLP. If it is not the recipient, and it is a multi-port device, it will forward the TLP to the appropriate egress port, subject to the rules for ordering and flow control for that egress port
- c. If it is neither the intended recipient nor a device in the path to it, it will generally reject the packet as an Unsupported Request (UR).
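Steps b and c can be sketched for one routing method. The Python below is a simplified illustration that assumes address-based routing only; the function `route_by_address`, the range tuples and the port names are hypothetical and do not come from the paper's design.

```python
def route_by_address(address, own_range, egress_ranges):
    """Decide what a device does with an address-routed TLP.

    own_range     -- (low, high) addresses claimed by this device (assumed)
    egress_ranges -- {port: (low, high)} windows forwarded per egress port
                     (assumed; empty for a single-port device)
    """
    low, high = own_range
    if low <= address <= high:
        return ("accept", None)            # step b: consume the TLP
    for port, (p_low, p_high) in egress_ranges.items():
        if p_low <= address <= p_high:
            return ("forward", port)       # step b: pass on to an egress port
    return ("unsupported_request", None)   # step c: reject as UR


windows = {"port1": (0x2000, 0x2FFF), "port2": (0x3000, 0x3FFF)}
decision = route_by_address(0x3010, own_range=(0x1000, 0x1FFF),
                            egress_ranges=windows)
```

A forwarded TLP would still be subject to the ordering and flow control rules of the chosen egress port, which this sketch does not model.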

Table 2 below summarizes the encodings used in TLP header Type and Format fields. These two fields, used together, indicate TLP format and routing to the receiver.

| TLP                                  | FMT[1:0]                                   | TYPE[4:0]                                |
|--------------------------------------|--------------------------------------------|------------------------------------------|
| Memory Read Request (MRd)            | 00 = 3DW, no data<br>01 = 4DW, no data     | 0 0000                                   |
| Memory Read Lock Request (MRdLk)     | 00 = 3DW, no data<br>01 = 4DW, no data     | 0 0001                                   |
| Memory Write Request (MWr)           | 10 = 3DW, w/ data<br>11 = 4DW, w/ data     | 0 0000                                   |
| IO Read Request (IORd)               | 00 = 3DW, no data                          | 0 0010                                   |
| IO Write Request (IOWr)              | 10 = 3DW, w/ data                          | 0 0010                                   |
| Config Type 0 Read Request (CfgRd0)  | 00 = 3DW, no data                          | 0 0100                                   |
| Config Type 0 Write Request (CfgWr0) | 10 = 3DW, w/ data                          | 0 0100                                   |
| Config Type 1 Read Request (CfgRd1)  | 00 = 3DW, no data                          | 0 0101                                   |
| Config Type 1 Write Request (CfgWr1) | 10 = 3DW, w/ data                          | 0 0101                                   |
| Message Request (Msg)                | 01 = 4DW, no data                          | 1 0RRR (for RRR, see routing subfield)   |
| Message Request W/Data (MsgD)        | 11 = 4DW, w/ data                          | 1 0RRR (for RRR, see routing subfield)   |
| Completion (Cpl)                     | 00 = 3DW, no data                          | 0 1010                                   |
| Completion W/Data (CplD)             | 10 = 3DW, w/ data                          | 0 1010                                   |
| Completion-Locked (CplLk)            | 00 = 3DW, no data                          | 0 1011                                   |
| Completion-Locked W/Data (CplDLk)    | 10 = 3DW, w/ data                          | 0 1011                                   |

Table 2. TLP Header Type and Format Field encoding
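The encodings in Table 2 amount to a lookup keyed on the pair (Fmt, Type). The following Python sketch shows one way a receiver model might decode them; the names `FIXED_TYPES` and `decode_tlp` are hypothetical, and returning `None` for combinations not listed in the table is an assumption of this sketch.

```python
# (Fmt, Type) pairs with a fixed Type value, taken from Table 2.
FIXED_TYPES = {
    (0b00, 0b00000): "MRd",    (0b01, 0b00000): "MRd",
    (0b00, 0b00001): "MRdLk",  (0b01, 0b00001): "MRdLk",
    (0b10, 0b00000): "MWr",    (0b11, 0b00000): "MWr",
    (0b00, 0b00010): "IORd",   (0b10, 0b00010): "IOWr",
    (0b00, 0b00100): "CfgRd0", (0b10, 0b00100): "CfgWr0",
    (0b00, 0b00101): "CfgRd1", (0b10, 0b00101): "CfgWr1",
    (0b00, 0b01010): "Cpl",    (0b10, 0b01010): "CplD",
    (0b00, 0b01011): "CplLk",  (0b10, 0b01011): "CplDLk",
}


def decode_tlp(fmt, tlp_type):
    """Return the TLP mnemonic for a Fmt/Type pair, or None if not in Table 2."""
    if (tlp_type >> 3) == 0b10:        # Type = 1 0RRR: message request
        if fmt == 0b01:
            return "Msg"
        if fmt == 0b11:
            return "MsgD"
        return None
    return FIXED_TYPES.get((fmt, tlp_type))


name = decode_tlp(0b10, 0b00000)
```

Note that the same Type value appears for a read and its matching write (for example MRd and MWr); only the Fmt field distinguishes them.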

#### 2.3. Split Transaction Protocol

Accesses to the four address spaces in PCI Express are accomplished using split-transaction requests and completions. The split transaction protocol is an improvement over earlier bus protocols (e.g. PCI) which made extensive use of bus wait-states or delayed transactions (retries) to deal with latencies in accessing targets.

In PCI Express, the completion following a request is initiated by the completer only when it has data and/or status ready for delivery. The fact that the completion is separated in time from the request which caused it also means that two separate TLPs are generated, with independent routing for the request TLP and the Completion TLP. Note that while a link is free for other activity in the time between a request and its subsequent completion, a split-transaction protocol involves some additional overhead, as two complete TLPs must be generated to carry out a single transaction.
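The bookkeeping a requester needs for split transactions can be sketched as follows. This Python model is an illustration only: the class `Requester`, the dictionary field names and the pool of 32 tags are assumptions, chosen to show how a unique identifier lets a later completion be matched to the request that caused it.

```python
class Requester:
    """Tracks outstanding non-posted requests until their completions arrive."""

    def __init__(self, requester_id, max_tags=32):
        self.requester_id = requester_id
        self.free_tags = list(range(max_tags))   # assumed tag pool size
        self.outstanding = {}                     # tag -> (address, length_dw)

    def issue_read(self, address, length_dw):
        """Build a read request TLP (as a dict) and remember it by tag."""
        if not self.free_tags:
            raise RuntimeError("no free tag: too many outstanding requests")
        tag = self.free_tags.pop(0)
        self.outstanding[tag] = (address, length_dw)
        return {"kind": "MRd", "requester_id": self.requester_id,
                "tag": tag, "address": address, "length_dw": length_dw}

    def on_completion(self, cpl):
        """Match a completion TLP to its request and release the tag."""
        if cpl["requester_id"] != self.requester_id:
            raise ValueError("completion addressed to another requester")
        request = self.outstanding.pop(cpl["tag"])
        self.free_tags.append(cpl["tag"])
        return request


req = Requester(requester_id=0x0100)
first = req.issue_read(0x1000, 4)
second = req.issue_read(0x2000, 8)
# The link is free between request and completion, so completions
# may come back in a different order than the requests were issued.
matched = req.on_completion({"kind": "CplD", "requester_id": 0x0100,
                             "tag": second["tag"]})
```

The two-TLP overhead mentioned above is visible here: every read costs one request dictionary going out and one completion dictionary coming back.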

#### 2.4. Virtual Channel (VC) and Transaction ordering

The VC mechanism provides support for carrying, throughout the fabric, traffic that is differentiated using TC labels. The foundation of VCs is independent fabric resources (queues/buffers and associated control logic). These resources are used to move information across Links with fully independent flow control between different VCs. Traffic is associated with VCs by mapping packets with particular TC labels to their corresponding VCs. Table 3 defines the ordering requirements for PCI Express Transactions. The rules defined in this table apply uniformly to all types of Transactions on PCI Express, including Memory, I/O, Configuration, and Messages. The ordering rules defined in this table apply within a single Traffic Class (TC). There is no ordering requirement among transactions carried on different Virtual Channels, since transactions with the same TC label are not allowed to be mapped to multiple VCs on any PCI Express Link. In Table 3, the columns represent a first issued transaction and the rows represent a subsequently issued transaction. The table entry indicates the ordering relationship between the two transactions.

In order to obtain higher efficiency, Flow Control (FC) is used to prevent overflow of Receiver buffers and to enable compliance with the ordering rules. Six types of information are tracked by Flow Control for each Virtual Channel: Posted Header (PH), Posted Data (PD), Non-Posted Header (NPH), Non-Posted Data (NPD), Completion Header (CplH) and Completion Data (CplD). Each Virtual Channel maintains an independent Flow Control credit pool. The unit of Flow Control credit is 4 DW for data.
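The credit check a transmitter performs before sending a TLP can be sketched as below. This Python fragment is a hedged illustration rather than the paper's implementation: it assumes 8-bit header and 12-bit data credit counters compared with modular arithmetic, and that each TLP consumes one header credit; the function names are hypothetical.

```python
HEADER_FIELD_BITS = 8     # assumed counter width for PH, NPH, CplH
DATA_FIELD_BITS = 12      # assumed counter width for PD, NPD, CplD
DW_PER_DATA_CREDIT = 4    # one data credit covers 4 DW, as stated above


def data_credits_needed(length_dw):
    """Round the payload length up to whole 4-DW credits."""
    return (length_dw + DW_PER_DATA_CREDIT - 1) // DW_PER_DATA_CREDIT


def enough_credits(credit_limit, credits_consumed, needed, field_bits):
    """Modular comparison so that wrapped counters still compare correctly."""
    modulus = 1 << field_bits
    remaining = (credit_limit - (credits_consumed + needed)) % modulus
    return remaining <= modulus // 2


def may_transmit(hdr_limit, hdr_consumed, data_limit, data_consumed, length_dw):
    """A TLP may be sent only if both its header and data credits fit."""
    return (enough_credits(hdr_limit, hdr_consumed, 1, HEADER_FIELD_BITS)
            and enough_credits(data_limit, data_consumed,
                               data_credits_needed(length_dw), DATA_FIELD_BITS))


# Posted write of 8 DW: needs 1 header credit and 2 data credits.
ok = may_transmit(hdr_limit=10, hdr_consumed=9,
                  data_limit=100, data_consumed=98, length_dw=8)
```

With an 8-bit counter, a result above 128 means the most significant bit of the difference is set, which is the case where the transmitter must hold the TLP back.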

| Row pass Column?                                                         | Posted Request: Memory Write or Message Request (Col 2) | Non-Posted Request: Read Request (Col 3) | Non-Posted Request: I/O or Configuration Write Request (Col 4) | Completion: Read Completion (Col 5) | Completion: I/O or Configuration Write Completion (Col 6) |
|--------------------------------------------------------------------------|---------------------------------------------------------|------------------------------------------|----------------------------------------------------------------|-------------------------------------|-----------------------------------------------------------|
| Posted Request: Memory Write or Message Request (Row A)                  | a) No<br>b) Y/N                                         | Yes                                      | Yes                                                            | a) Y/N<br>b) Yes                    | a) Y/N<br>b) Yes                                          |
| Non-Posted Request: Read Request (Row B)                                 | No                                                      | Y/N                                      | Y/N                                                            | Y/N                                 | Y/N                                                       |
| Non-Posted Request: I/O or Configuration Write Request (Row C)           | No                                                      | Y/N                                      | Y/N                                                            | Y/N                                 | Y/N                                                       |
| Completion: Read Completion (Row D)                                      | a) No<br>b) Y/N                                         | Yes                                      | Yes                                                            | a) Y/N<br>b) No                     | Y/N                                                       |
| Completion: I/O or Configuration Write Completion (Row E)                | Y/N                                                     | Yes                                      | Yes                                                            | Y/N                                 | Y/N                                                       |

Table 3. Ordering rules summary
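Table 3 can be held as a small lookup structure in a behavioural model. The Python sketch below simply transcribes the table entries; the names `ORDERING` and `ordering_entry` are hypothetical, and the two-case entries are kept as strings because the conditions that distinguish case a) from case b) are not spelled out in this paper.

```python
# Rows are the later transaction, columns (2..6) the earlier one.
ORDERING = {
    "A": ("a) No; b) Y/N", "Yes", "Yes", "a) Y/N; b) Yes", "a) Y/N; b) Yes"),
    "B": ("No", "Y/N", "Y/N", "Y/N", "Y/N"),
    "C": ("No", "Y/N", "Y/N", "Y/N", "Y/N"),
    "D": ("a) No; b) Y/N", "Yes", "Yes", "a) Y/N; b) No", "Y/N"),
    "E": ("Y/N", "Yes", "Yes", "Y/N", "Y/N"),
}


def ordering_entry(row, col):
    """Look up whether the transaction in `row` may pass the one in `col`."""
    if row not in ORDERING or not 2 <= col <= 6:
        raise KeyError(f"no entry for row {row!r}, column {col}")
    return ORDERING[row][col - 2]


# May a later read request pass an earlier memory write?
entry = ordering_entry("B", 2)
```

For example, the "No" entries in column 2 for rows B and C say that non-posted requests stay behind an earlier posted write.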

#### 3. SYSTEM ARCHITECTURE AND DESIGN

The Receive layer, Configuration space and Transmit layer are the three modules that form the basic architecture of the PCI Express transaction layer. Every Receive layer consists of eight VCs, VC<sub>0</sub> to VC<sub>7</sub>, of which VC<sub>0</sub> is mandatory while the others are optional. The basic architecture is shown in Figure 3.

Each VC is subdivided into two modules: a Receive buffer and a Controller. Each Receive buffer is subdivided into six parts in order to store PH, PD, NPH, NPD, CplH, and CplD respectively.

Each header buffer stores no more than 128 credits and each data buffer stores no more than 2047 credits. The size of the buffers is indicated by the FC credits. The Controller implements the FIFO, FC and sequencing (ordering) functions. Posted and Cpl packets must be strictly sequenced according to the transaction ordering rules.

Compared with the Receive layer, the Transmit layer adds VC Arbitration and Digest modules. The Transmit layer not only orders Posted and Cpl packets but also checks Flow Control.

- a. Interface signals to the Software Layer and handshake sequence rules: desc\_n[127:0] carries the VCn transmit header and data; data\_n[127:0] carries the VCn receive header and data. Header handshake for transmit and receive: the sender sets req\_n to indicate RTS (Request To Send), and the receiver, on receiving the request, sets ack\_n. Data handshake for transmit and receive: the sender sets dfr\_n to indicate the beginning of the data transfer. ws\_n is set to make the transfer wait when the receiver is busy, and the next data is not sent until ws\_n is cleared. When the last 1DW of data has been sent, dfr\_n is cleared to inform the receiver that the data transfer has finished.
- b. Interface signals to the Data Link Layer and handshake sequence rules: desc\_data[31:0] is the TLP signal which comes from the Data Link Layer; the corresponding part of Frame represents the frame header. BlockT indicates the Replay state of the Data Link Layer and also indicates that TLP transmission must pause. FC credits n indicates the FC credit values, comprising cred\_alloc\_p, cred\_alloc\_np, cred\_alloc\_cpl, cl\_p, cl\_np and cl\_cpl, that need to be transmitted or refreshed. vc\_req\_n is set when VCn needs to transmit data to the DLL, and vc\_ack\_n is set when the Arbitrator grants this request; when the VC receives vc\_ack\_n, it sets vc\_get\_n. When the last data has been sent, vc\_get\_n is cleared at the same time, which informs the Arbitrator to end the VC's permission. The configuration register signals conf\_addr, conf\_data\_i, conf\_data\_o, conf\_wr and conf\_rd are the configuration address, data in, data out, write and read signals respectively; all of them are used to configure the VC arbitration mode of the Arbitrator.
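The data handshake in item a can be traced cycle by cycle with a small behavioural model. The Python sketch below is one interpretation, not the authors' VHDL: it assumes one DW is transferred on every clock in which dfr is set and ws is clear, and that dfr is cleared in the cycle after the last DW; `simulate_data_phase` and the `busy_cycles` argument are hypothetical.

```python
def simulate_data_phase(words, busy_cycles):
    """Return a per-cycle trace of (cycle, dfr, ws, word_sent).

    words       -- data words (DW) to transfer, in order
    busy_cycles -- finite set of clock cycles in which the receiver is busy
                   (assumption: this is what causes ws to be set)
    """
    trace = []
    cycle = 0
    index = 0
    while index < len(words):
        ws = cycle in busy_cycles
        sent = None
        if not ws:                      # transfer only when no wait state
            sent = words[index]
            index += 1
        trace.append((cycle, 1, int(ws), sent))
        cycle += 1
    trace.append((cycle, 0, 0, None))   # dfr cleared after the last DW
    return trace


trace = simulate_data_phase(["D0", "D1", "D2"], busy_cycles={1})
```

In this run the wait state in cycle 1 delays D1 by one clock, so three words take four cycles before dfr is cleared.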



Figure 3. Transaction layer architecture

#### 4. TRANSMITTER AND RECEIVER FLOW CHART

The transmitter flowchart is shown in Figure 4. The only difference between the transmitter flowchart and the receiver flowchart is that the former adds flow control, so pnpc\_rd also depends on the flow control state. If c\_ph[7] is 1, the other side is already full. cl\_p stands for the CredAlloc of P, and cc\_p stands for the Credits Consumed of P. Flowchart (d) is the virtual channel flow. In the initial state, if the Data Link Layer is not blocked and vc\_req is valid, now\_vc is updated to the value of next\_vc and the next channel is chosen.

The selection of the channel is decided by the arbitration mode. Because the arbitrator supports a look-up table, a VC that has nothing to send may be chosen, so we must check at this moment whether the vc\_req\_n of the chosen channel is 1. vc\_get\_n is cleared when the virtual channel transmits its last 1DW of data, which lets vc\_arb return to the initial state. Before arbitration, the VC registers must be configured in order to choose the arbitration mode.
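The table-driven channel selection described above can be sketched as follows. This Python model is a simplified illustration under stated assumptions: the arbitration table is a plain list of VC numbers visited in order, a blocked Data Link Layer prevents any grant, and the names `select_vc`, `arb_table` and `position` are hypothetical.

```python
def select_vc(arb_table, position, vc_req, dll_blocked=False):
    """Pick the next virtual channel from a look-up table.

    arb_table   -- list of VC numbers; a VC may appear several times
    position    -- index of the table entry granted last time
    vc_req      -- {vc: bool} request flag per virtual channel
    dll_blocked -- True while the Data Link Layer cannot accept TLPs
    Returns (vc, new_position), or (None, position) if nothing is granted.
    """
    if dll_blocked:
        return None, position
    size = len(arb_table)
    for step in range(1, size + 1):
        index = (position + step) % size
        vc = arb_table[index]
        if vc_req.get(vc, False):   # skip table entries whose VC is idle
            return vc, index
    return None, position


table = [0, 1, 0, 2]                # VC0 gets two of the four slots
requests = {0: False, 1: False, 2: True}
granted, pos = select_vc(table, position=0, vc_req=requests)
```

Checking the request flag inside the loop is what avoids granting a table slot to a channel with nothing to send.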



Figure 4. Transmitter Flow Chart

Figure 5. Receiver Flow Chart

Figure 5 shows the Receiver flowchart, which has four sub-flowcharts. Flowchart (a) is the read flowchart: in the initial state it first judges from pnpc\_rd whether to read P, NP or Cpl TLPs. If pnpc\_rd is 3'b100, read\_p is set to start flowcharts (b) and (c), which transmit PH and PD. The Length register records the remaining data in order to judge whether the data has been completely transmitted. When flowcharts (b) and (c) have finished, Rx\_rd\_st returns to the initial state. When the Data Link Layer transmits packets, the Frame synchronization signal is sent and flowchart (d) starts to receive data at the same time. St1 judges from pnpc\_wr whether the next state is a P, NP or Cpl TLP, and the header and data are then written. The Length register is again used to judge whether the data has been completely transferred. Writing NP and Cpl TLPs is roughly the same as writing P TLPs.

#### 5. SIMULATION RESULTS

The Transmitter and Receiver flowcharts are simulated for functional verification using the Active HDL simulation tool. Figure 6 and Figure 7 show the simulation results for the receiver and transmitter virtual channels respectively.

In Figure 6 the different transactions made by the receiver virtual channel are separated by a vertical line every 50 ns. For example, the interval from 0 to 50 ns shows the initialization of the various buffers, and the interval from 350 ns to 400 ns shows the transaction of receiving data from the Data Link Layer.

Figure 7 shows the various transactions made by the transmitter virtual channel. For convenience, the transactions are separated by a vertical line at every multiple of 50 ns. The transactions include header and data reads for Posted, Non-Posted and Completion transactions.



Figure 6. Simulation diagram of Receiver Virtual channel



Figure 7. Simulation diagram of Transmitter Virtual channel

#### 6. CONCLUSION

In this paper we presented a method of implementing the transaction layer of the widely adopted, high-performance PCI Express interconnect with a top-down design method, and wrote the VHDL code to implement it. The simulation results show that the design achieves the basic functions of the PCI Express Transaction Layer and meets the protocol of the PCI Express<sup>™</sup> Base Specification Revision 2.0. The same basic design methods can be used to improve equipment transmission bandwidth by further implementing the PCI Express IP core.

#### REFERENCES

- [1] Intel Developer web site, http://www.intel.com/technology/3gio
- [2] PCI-SIG web site, http://www.pcisig.com
- [3] Ravi Budruk, Don Anderson, Tom Shanley, PCI Express System Architecture, MindShare.
- [4] PCI-SIG, PCI Express<sup>™</sup> Base Specification, Rev. 2.0, 2006.12.20
- [5] PLD Applications, PCI Express Expert Core Reference Manual, 2006-02
- [6] V. Bhaskar, VHDL Primer, Prentice Hall/Pearson, 2005.01